Some topics on similarity metric learning
نویسنده
چکیده
The success of many computer vision problems and machine learning algorithms critically depends on the quality of the chosen distance metrics or similarity functions. Due to the fact that the real-data at hand is inherently taskand data-dependent, learning an appropriate distance metric or similarity function from data for each specific task is usually superior to the default Euclidean distance or cosine similarity. This thesis mainly focuses on developing new metric and similarity learning models for three tasks: unconstrained face verification, person re-identification and kNN classification. Unconstrained face verification is a binary matching problem, the target of which is to predict whether two images/videos are from the same person or not. Concurrently, person re-identification handles pedestrian matching and ranking across non-overlapping camera views. Both vision problems are very challenging because of the large transformation differences in images or videos caused by pose, expression, occlusion, problematic lighting and viewpoint. To address the above concerns, two novel methods are proposed. Firstly, we introduce a new dimensionality reduction method called Intra-PCA by considering the robustness to large transformation differences. We show that Intra-PCA significantly outperforms the classic dimensionality reduction methods (e.g. PCA and LDA). Secondly, we propose a novel regularization framework called Sub-SML to learn distance metrics and similarity functions for unconstrained face verification and person re-identification. The main novelty of our formulation is to incorporate both the robustness of Intra-PCA to large transformation variations and the discriminative power of metric and similarity learning, a property that most existing methods do not hold. Working with the task of kNN classification which relies a distance metric to identify the nearest neighbors, we revisit some popular existing methods for metric learning and develop a general formulation called DMLp for learning a distance metric from data. To obtain the optimal solution, a gradient-based optimization algorithm is proposed which only needs the computation of the largest eigenvector of a matrix per iteration. Although there is a large number of studies devoted to metric/similarity learning based on different objective functions, few studies address the generalization analysis of such methods. We describe a novel approch for generalization analysis of metric/similarity learning which can deal with general matrix regularization terms including the Frobenius norm, sparse L1-norm, mixed (2, 1)-norm and trace-norm. The novel models developed in this thesis are evaluated on four challenging databases: the Labeled Faces in the Wild dataset for unconstrained face verification in still images; the YouTube Faces database for video-based face verification in the wild; the Viewpoint Invariant Pedestrian Recog-
منابع مشابه
An Effective Approach for Robust Metric Learning in the Presence of Label Noise
Many algorithms in machine learning, pattern recognition, and data mining are based on a similarity/distance measure. For example, the kNN classifier and clustering algorithms such as k-means require a similarity/distance function. Also, in Content-Based Information Retrieval (CBIR) systems, we need to rank the retrieved objects based on the similarity to the query. As generic measures such as ...
متن کاملComposite Kernel Optimization in Semi-Supervised Metric
Machine-learning solutions to classification, clustering and matching problems critically depend on the adopted metric, which in the past was selected heuristically. In the last decade, it has been demonstrated that an appropriate metric can be learnt from data, resulting in superior performance as compared with traditional metrics. This has recently stimulated a considerable interest in the to...
متن کاملSimilarity measurement for describe user images in social media
Online social networks like Instagram are places for communication. Also, these media produce rich metadata which are useful for further analysis in many fields including health and cognitive science. Many researchers are using these metadata like hashtags, images, etc. to detect patterns of user activities. However, there are several serious ambiguities like how much reliable are these informa...
متن کاملیادگیری نیمه نظارتی کرنل مرکب با استفاده از تکنیکهای یادگیری معیار فاصله
Distance metric has a key role in many machine learning and computer vision algorithms so that choosing an appropriate distance metric has a direct effect on the performance of such algorithms. Recently, distance metric learning using labeled data or other available supervisory information has become a very active research area in machine learning applications. Studies in this area have shown t...
متن کاملDimensionality Invariant Similarity Measure
This paper presents a new similarity measure to be used for general tasks including supervised learning, which is represented by the K-nearest neighbor classifier (KNN). The proposed similarity measure is invariant to large differences in some dimensions in the feature space. The proposed metric is proved mathematically to be a metric. To test its viability for different applications, the KNN u...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2015